Indices options group wildcards #7

Closed
wants to merge 66 commits into from

Conversation

gmarouli
Owner

This PR is a draft alternative to elastic#103518, provided that this refactoring is accepted.

We believe that this refactoring makes it possible to add the failure store to the current IndicesOptions.

  • The new builders allow us to (temporarily) rebuild IndicesOptions with the failure store without having to extend the boolean factory methods.
  • Using the builders means that the default option of not including the failure store indices is applied everywhere.
  • It will make adding this to search easier: there are a lot of request rewrites through which we would otherwise need to wire the DataStreamOptions, whereas this way we can rely on the existing infrastructure.
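To illustrate the builder argument in the list above, here is a minimal sketch. It is not the real `IndicesOptions` API: the class `OptionsSketch`, its fields, and its method names are all hypothetical, chosen only to show how a builder keeps "exclude the failure store" as the default while letting a caller rebuild existing options with it enabled.

```java
// Hypothetical sketch, NOT the actual Elasticsearch IndicesOptions class.
final class OptionsSketch {
    final boolean allowNoIndices;
    final boolean expandOpen;
    final boolean includeFailureStore;

    private OptionsSketch(Builder b) {
        this.allowNoIndices = b.allowNoIndices;
        this.expandOpen = b.expandOpen;
        this.includeFailureStore = b.includeFailureStore;
    }

    // Fresh builder: every option starts at its default.
    static Builder builder() {
        return new Builder();
    }

    // Builder seeded from existing options, for "rebuild with one change".
    static Builder builder(OptionsSketch base) {
        Builder b = new Builder();
        b.allowNoIndices = base.allowNoIndices;
        b.expandOpen = base.expandOpen;
        b.includeFailureStore = base.includeFailureStore;
        return b;
    }

    static final class Builder {
        private boolean allowNoIndices = true;
        private boolean expandOpen = true;
        // Assumed default: failure store indices are not included.
        private boolean includeFailureStore = false;

        Builder allowNoIndices(boolean value) {
            this.allowNoIndices = value;
            return this;
        }

        Builder expandOpen(boolean value) {
            this.expandOpen = value;
            return this;
        }

        Builder includeFailureStore(boolean value) {
            this.includeFailureStore = value;
            return this;
        }

        OptionsSketch build() {
            return new OptionsSketch(this);
        }
    }
}
```

With this shape, adding a new option means adding one builder field with a default, rather than widening every boolean factory method and its call sites.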

I am starting to see potential here.

gmarouli and others added 30 commits January 23, 2024 11:28
)

The stream + collect operation on empty lists was causing 6% of all
allocations during document parsing. Let's have 0-alloc methods to check
for these dynamic fields before we do anything about them.
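A rough sketch of the idea, under stated assumptions: the class `DynamicFieldCheck`, the `dyn_` prefix predicate, and the method names are hypothetical and do not come from the actual change. It only shows the pattern of running a cheap check first so that the stream pipeline is never built for the common empty or no-match case.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of "check cheaply before streaming".
final class DynamicFieldCheck {
    // Hypothetical predicate standing in for the real dynamic-field test.
    static boolean isDynamic(String name) {
        return name.startsWith("dyn_");
    }

    // Builds a stream pipeline and a result list on every call,
    // even when the input list is empty.
    static List<String> collectDynamic(List<String> fields) {
        return fields.stream().filter(DynamicFieldCheck::isDynamic).collect(Collectors.toList());
    }

    // Index-based loop: no stream, no iterator, no result list.
    // Assumes a random-access list such as ArrayList.
    static boolean hasDynamic(List<String> fields) {
        for (int i = 0, n = fields.size(); i < n; i++) {
            if (isDynamic(fields.get(i))) {
                return true;
            }
        }
        return false;
    }

    // Only pays for the stream when the cheap check says there is work to do.
    static List<String> dynamicOrEmpty(List<String> fields) {
        return hasDynamic(fields) ? collectDynamic(fields) : Collections.emptyList();
    }
}
```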
Even with Kahan summation, there is a tiny precision loss. We already fixed this before for GeoPoint by using the encode/decode to doc-values quantization in the final display results, and for CartesianPoint we fix by casting the coordinates to float for both the expected and actual results.
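For readers unfamiliar with the technique, here is a minimal sketch of Kahan (compensated) summation, together with a float-cast comparison in the spirit of the CartesianPoint fix described above. The class and method names are hypothetical and this is not the code from the commit.

```java
// Hypothetical sketch of Kahan summation and a float-precision comparison.
final class KahanSum {
    // Compensated summation: c carries the low-order bits lost in each add.
    static double sum(double[] values) {
        double sum = 0.0;
        double c = 0.0;
        for (double v : values) {
            double y = v - c;      // apply the correction from the previous step
            double t = sum + y;    // low-order bits of y may be lost here
            c = (t - sum) - y;     // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    // Plain left-to-right summation, for comparison.
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Compare two results at float precision, so residual double-precision
    // differences do not fail an equality check.
    static boolean sameAsFloat(double expected, double actual) {
        return (float) expected == (float) actual;
    }
}
```

Summing `1.0` followed by many values of `1e-16` is a simple way to see the difference: the naive loop drops every small term, while the compensated loop retains most of their contribution.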
Using the Collections wrapper is less optimal than the `new
ConcurrentHashMap<V, Boolean>().keySet(Boolean.TRUE);` solution. With
the collection wrapper, mod counts on the CHM are updated on NOOP add
calls. For contended concurrent no-op updates, this solution was
benchmarked to be more than 2x as fast and should outperform the
Collections wrapper in essentially all cases (not just for the mod count
update reasons, it also has fewer virtual calls).
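The two constructions being compared can be sketched side by side as follows. This assumes that "the Collections wrapper" refers to `Collections.newSetFromMap`; the commit message does not name the method, and the class `ConcurrentSets` is hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper showing both ways to get a concurrent set.
final class ConcurrentSets {
    // Assumed to be the "Collections wrapper": a Set view that delegates
    // to the backing map through an extra wrapper object.
    static <V> Set<V> viaCollectionsWrapper() {
        return Collections.newSetFromMap(new ConcurrentHashMap<>());
    }

    // The alternative quoted in the commit message: a key-set view whose
    // add() maps new keys to Boolean.TRUE.
    static <V> Set<V> viaKeySetView() {
        return new ConcurrentHashMap<V, Boolean>().keySet(Boolean.TRUE);
    }
}
```

Both behave as sets from the caller's point of view: adding an element that is already present returns `false` and leaves the size unchanged. The difference claimed in the commit message is in the cost of those no-op adds.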
Moves some of the detail about S3 storage classes to their own section
for easier linking, and adds a note about `intelligent_tiering` archive
classes.
Misc tidy-up following elastic#104394:

- This action only runs on the coordinating node, no need to define wire
  serialization for its request/response types.

- No need to subclass `ActionType`, nor to define how to receive
  responses from remote clusters.

- Moves to executing an `AbstractRunnable` to be sure to handle all
  failures (including threadpool rejections) properly.
elastic#104722)

When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.
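One way to read the rule above is as a lower bound on the next search start. The sketch below uses hypothetical names (`SearchIntervalAdvance`, `nextStart`, `earliestDataHint`) and is not the datafeed code; it only shows why a minimum step of one chunk rules out the infinite loop.

```java
// Hypothetical sketch of "always advance by at least one time chunk".
final class SearchIntervalAdvance {
    /**
     * @param currentStart     start of the interval that returned no data (epoch millis)
     * @param chunkMillis      length of one time chunk, assumed positive
     * @param earliestDataHint where the simple aggregation believes data begins
     * @return the next interval start, never less than currentStart + chunkMillis
     */
    static long nextStart(long currentStart, long chunkMillis, long earliestDataHint) {
        long minimumAdvance = currentStart + chunkMillis;
        // If the hint points at or before the current interval (because the
        // datafeed's own aggregation filtered everything out), still move forward.
        return Math.max(earliestDataHint, minimumAdvance);
    }
}
```

If the hint keeps pointing at the current interval, each call still moves forward by one chunk, which matches the "step slowly" worst case described above.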

Fixes elastic#104699
* Starting cohere

* Making progress on cohere

* Filling out the embedding types

* Working cohere

* Fixing tests

* Removing rate limit error message

* Fixing a few comments

* Update docs/changelog/104559.yaml

* Addressing most feedback

* Using separate named writeables for byte results and floats

* Fixing a few comments and adding tests

* Fixing mutation issue

* Removing cohere service settings from named writeable registry
Lower the upper bound of the large response size from
2 times suggestedMaxAllocationSize to 1.5 times so that it can be sent
over in 3 messages. This is because the split threshold is 0.99 of
suggestedMaxAllocationSize.

Resolves: elastic#104728
…ards (elastic#104709)

Remove the +1L that allows the third-smallest shard to also be allocated on the node
 in case it is only 1b bigger than the second-smallest
`ActionType` represents an action which runs on the local node, there's
no need for implementations to define a `Reader<Response>`. This commit
removes the unused constructor argument.
gmarouli and others added 27 commits January 25, 2024 15:07
* Avoid eager task realization in esql qa projects
* Fix eager task realization in PomValidationPrecommitPlugin
* Make loadCsvSpecData task lazy created
* Fix test task reference
…elastic#104725)

Recently a user saw spurious delayed data warnings. These turned
out to be due to accidentally setting `summary_count_field` to a
field that was always zero. This meant that every document was
considered delayed.
Co-authored-by: Elastic Machine <[email protected]>
…04623)

* Functions E-Z

* Incorporate changes from elastic#103686

* More functions

* More functions

* Update docs/reference/esql/functions/floor.asciidoc

Co-authored-by: Liam Thompson <[email protected]>

* Update docs/reference/esql/functions/left.asciidoc

Co-authored-by: Liam Thompson <[email protected]>

* Apply suggestions from code review

Co-authored-by: Alexander Spies <[email protected]>

* Review feedback

* Fix geo_shape description

* Change 'colum'/'field' into 'expressions'

* Review feedback

* One more

---------

Co-authored-by: Liam Thompson <[email protected]>
Co-authored-by: Alexander Spies <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
A recent report shows that we can perform ESQL planning on the refresh 
thread pool after waiting for refreshes from search-idle shards. While
the planning process is generally lightweight, it may become expensive
at times. Therefore, we should fork off the refresh thread pool
immediately upon resuming ESQL execution. Another place where we should
fork off is after field_caps. I will look into that later.
* Add CO2 data for AWS ap-southeast-3 and me-central-1

* Update CO2 data for GCP including new regions

* Update CO2 data for Azure including new/changed regions

* Move provider data into CloudProviders.java

* Add NOTICE and LICENSE for the provider data

* Document CloudProviders' public functions

* Add cloudcarbonfootprint-(NOTICE|LICENSE).txt as ignoreFile

---------

Co-authored-by: Elastic Machine <[email protected]>
Build these more lazily avoiding putting them in an array and don't keep
an accidental reference to the aggregator itself.
On reflection it was probably a mistake to give each
`ChunkedRestResponseBody` a nontrivial lifecycle in elastic#99871. The
lifecycle really belongs to the whole containing `RestResponse`. This
commit moves it there.
…lastic#104547)

* Allow both string and datetime as the third and fourth inputs to auto_bucket

 Committer: Fang Xing <[email protected]>

data_stream/190_require_data_stream/Testing require_data_stream in bulk
requests

Awaiting fix from elastic#104774
This shouldn't actually change anything, as the format has not been modified recently. This simply marks 8500009 as used by 8.12.1 to help with patches on the 8.12 branch

@gmarouli gmarouli closed this Jan 26, 2024
elasticmachine pushed a commit that referenced this pull request Oct 7, 2024